String-Tree Correspondence Grammar: A Declarative Grammar Formalism For Defining The Correspondence Between Strings Of Terms And Tree Structures
Abstract
The paper introduces a grammar formalism for defining three things: the set of sentences in a language, a set of labeled trees (not the derivation trees of the grammar) that represent the interpretation of those sentences, and the (possibly non-projective) correspondence between the subtrees of each tree and the substrings of the related sentence. The grammar formalism is motivated by the linguistic approach adopted at GETA, in which a multilevel interpretative structure is associated with a sentence. The topology of the multilevel structure is motivated by 'meaning', so its substructures may not correspond projectively to the substrings of the related sentence.
Grammar formalisms have been developed for various purposes. Generative-Transformational Grammars, Generalized Phrase Structure Grammars, Lexical Functional Grammar, etc. were designed as explanatory models of human language performance, while others, such as the Definite Clause Grammars, were geared more towards direct interpretability by machines. In this paper, we introduce a declarative grammar formalism for establishing the relation between, on the one hand, a set of strings of terms and, on the other, a set of structural representations. A structural representation is in a form amenable to processing (say, for translation into another language), in which all and only the relevant contents or 'meaning' (in some sense adequate for the purpose) of the related string are exhibited. The grammar can also be interpreted to perform analysis (given a string of terms, produce a structural representation capturing the 'meaning' of the string) or generation (given a structural representation, produce a string of terms whose meaning is captured by that structural representation).
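To make concrete the idea that a single declarative description can be read in both directions, here is a toy sketch under stated assumptions. The lexicon, the single rule `RULE`, the `Tree` type, and the functions `analyse` and `generate` are all hypothetical and do not reproduce the paper's rule notation. The rule just states that the term pattern N V N corresponds to a tree rooted at the verb, with the first noun as ARG0 and the second as ARG1. Analysis and generation consult the same rule data.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon: term -> category (not the paper's notation).
LEXICON = {"John": "N", "Mary": "N", "loves": "V", "sees": "V"}

# One hypothetical declarative rule: the term pattern N V N corresponds to a
# tree rooted at the verb, with the first noun as ARG0 and the second as ARG1.
RULE = {
    "pattern": ("N", "V", "N"),
    "root": 1,                           # pattern index of the root's term
    "args": (("ARG0", 0), ("ARG1", 2)),  # (role, pattern index) pairs
}


@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple[tuple[str, "Tree"], ...] = ()  # (role, subtree) pairs


def analyse(terms: list[str], rule: dict = RULE,
            lexicon: dict[str, str] = LEXICON) -> Optional[Tree]:
    """Analysis: string of terms -> structural representation, or None."""
    categories = tuple(lexicon.get(term) for term in terms)
    if categories != rule["pattern"]:
        return None
    children = tuple((role, Tree(terms[i])) for role, i in rule["args"])
    return Tree(terms[rule["root"]], children)


def generate(tree: Tree, rule: dict = RULE,
             lexicon: dict[str, str] = LEXICON) -> Optional[list[str]]:
    """Generation: structural representation -> string of terms, or None."""
    by_role = dict(tree.children)
    if set(by_role) != {role for role, _ in rule["args"]}:
        return None
    terms = [""] * len(rule["pattern"])
    terms[rule["root"]] = tree.label
    for role, i in rule["args"]:
        terms[i] = by_role[role].label
    # Accept the string only if the same rule relates it back to the tree.
    return terms if analyse(terms, rule, lexicon) == tree else None


tree = analyse(["John", "loves", "Mary"])
print(generate(tree))  # ['John', 'loves', 'Mary']
```

Because `generate` re-checks its output with `analyse`, both directions are constrained by the same rule data. This is only a rough analogue of interpreting one declarative grammar for both analysis and generation.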
It must be emphasised here that the grammar writer is at liberty (within certain constraints) to design the structural representation for a given string of terms, because its topology is independent of the derivation tree of the grammar. The writer is equally free to design the nature of the correspondence between the two (for example, according to certain linguistic criteria). The grammar formalism is only a tool for expressing the structural representation, the related string, and the correspondence. The formalism is motivated by the linguistic approach adopted at GETA, in which a multilevel interpretative structure is associated with a sentence. The multilevel structure is motivated by 'meaning', so its substructures may not correspond projectively to the substrings of the related sentence. The defining characteristic of the linguistic approach is the design of the multilevel structures, while the grammar formalism is the tool (notation) for expressing these multilevel structures, their related sentences, and the nature of the correspondence between the two. In this paper, we present only the grammar formalism; a discussion of the linguistic approach can be found in [Vauquois 78] and [Zaharin 87]. In this grammar formalism, a structural representation is given in the form of a labeled tree. The relation between a string of terms and a structural representation is defined as a mapping between elements of the set of substrings of the string and elements of the set of subtrees of the tree. Such a relation is called a string-tree correspondence. An example of a string-tree correspondence is given in Fig. 1.
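A string-tree correspondence of this kind can be sketched as a mapping from subtrees to sets of term positions. The sketch rests on three assumptions of ours, none of which is the paper's notation. First, each node accounts for a set of positions. Second, a subtree's substring is the union of the positions in that subtree. Third, a correspondence counts as non-projective when some subtree maps to a substring with a gap. The sentence and tree are a hypothetical illustration, not the example of Fig. 1. All names (`Node`, `correspondence`, `non_projective`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    positions: frozenset[int]  # term positions this node itself accounts for
    children: list["Node"] = field(default_factory=list)


def subtree_positions(node: Node) -> frozenset[int]:
    """Positions of the substring corresponding to the subtree rooted at node."""
    result = set(node.positions)
    for child in node.children:
        result |= subtree_positions(child)
    return frozenset(result)


def all_nodes(root: Node) -> list[Node]:
    """Pre-order list of nodes; each node roots exactly one subtree."""
    nodes = [root]
    for child in root.children:
        nodes.extend(all_nodes(child))
    return nodes


def correspondence(root: Node, terms: list[str]) -> dict[str, tuple[str, ...]]:
    """Map each subtree (keyed by its root label) to its possibly gapped substring."""
    return {
        node.label: tuple(terms[i] for i in sorted(subtree_positions(node)))
        for node in all_nodes(root)
    }


def is_contiguous(positions: frozenset[int]) -> bool:
    return not positions or max(positions) - min(positions) + 1 == len(positions)


def non_projective(root: Node) -> list[str]:
    """Labels of subtrees whose substring has a gap (assumed notion of non-projectivity)."""
    return [n.label for n in all_nodes(root)
            if not is_contiguous(subtree_positions(n))]


def leaf(label: str, pos: int) -> Node:
    return Node(label, frozenset({pos}))


# Illustrative sentence with an extraposed modifier ("A hearing ... on the issue").
terms = ["A", "hearing", "is", "scheduled", "on", "the", "issue", "today"]
tree = Node("scheduled", frozenset({3}), [
    Node("hearing", frozenset({1}), [
        leaf("a", 0),
        Node("on", frozenset({4}), [Node("issue", frozenset({6}), [leaf("the", 5)])]),
    ]),
    leaf("is", 2),
    leaf("today", 7),
])

print(correspondence(tree, terms)["hearing"])  # ('A', 'hearing', 'on', 'the', 'issue')
print(non_projective(tree))                     # ['hearing']
```

The subtree rooted at `hearing` corresponds to a discontinuous substring, which a purely projective correspondence could not express. Keying subtrees by root label is a simplification that works only when labels are unique, as they are in this example.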